Abstract. The standard iterated prisoner's dilemma is an unrealistic model of social behaviour
because it forces individuals to participate in the interaction. We analyse a model in which players
have the option of ending their association. If the payoff for living alone is neither too high nor too
low then the potential for cooperative behaviour is enhanced. For some parameter values it is also
possible for a polymorphic population of defectors and conditional cooperators to be stable.



Introduction 
The iterated, or repeated, prisoner's dilemma is the most popular model of social interactions
[1]. Since its inception the basic model has been modified in many ways (see Dugatkin [2] for a
review). However, in all these versions it is assumed that the players must engage in the interaction
and have no opportunity to end it. This unrealistic feature is just one facet of the more general
assumption that one particular social interaction may be considered in isolation from all others
that an individual may face.

In this paper we use the framework of stochastic games [3] to consider a version of the iterated
prisoner's dilemma in which the players may choose to discontinue their association. We assume
that once the partnership has been dissolved by one or more of the players, then each receives
the same, fixed, per-period payoff. This is probably the simplest way that an interaction can
be considered as being dependent on other situations in which individuals find themselves during
a complex and, at least partly, social life. We will use the standard replicator dynamics [4] to
investigate the effect that the existence of this outside option has on the evolution of cooperative
behaviour in a population of players.



The Model 

A general stochastic game has three major components: the set of states, the games played in 
each of these states and the (possibly behaviour-dependent) probabilities for transition between 
the states. In our model the states represent the different contexts in which players may interact, 
so we will refer to them as context games. 

We consider an interaction described by the following multi-state, stochastic game. There are
three possible context-games (states) $G_0$, $G_1$ and $G_2$. The interaction starts with context-game
$G_0$. In this game the players make the decision about whether or not they wish to initiate or
continue an association. The first player and the second player choose between two possible actions:
A = "associate" or B = "break up". There are no payoffs directly associated with this decision.
Context-game $G_1$ represents some specific activity in which the individuals can participate together.
It is modelled by the prisoner's dilemma and the players choose between the possible actions:
C = "cooperation" or D = "defection". Context-game $G_2$ can be considered as a background state
representing the situation when there is no interaction or association between the players. There
is only one possible action: L = "be alone".

Table 1. The multi-state game. In each state the per-period payoffs for the relevant action choices
are given above and to the left of the diagonal line; the behaviour-dependent probabilities of
transition to the other states are given below the line.

The actions chosen define both immediate payoffs to the individuals and future transition
probabilities. The immediate payoffs collected by the players are given in table 1. The first entry in
each payoff pair contains the payoff to the player $P_1$, who selects the row action, the second is for
the player $P_2$, who selects the column action. In this paper we are considering an extension of the
standard iterated prisoner's dilemma, for which the following inequalities hold in $G_1$:

$$t > r > p > s > 0.$$



Transition probabilities, which are determined by the choice of actions, are presented in table 1 as
a set of three numbers. This set of numbers appears in square brackets in each cell of the matrices.
Here the first, second or third number is, respectively, the probability that context-game $G_0$, $G_1$ or
$G_2$ is played at the next round. The probabilities are defined by the following rules. If context-game
$G_0$ is played and action A = "associate" is chosen by both players, at the next round context-game
$G_1$ is played; if action B = "break up" is chosen by at least one player, context-game $G_2$ is played at
the next round with probability 1. Whatever actions are chosen when context-game $G_1$ is played,
context-game $G_0$ is played at the next round with probability 1. If context-game $G_2$ is played, at
the next round context-game $G_2$ is played again with probability 1.

We assume that after playing context game $G_0$, players survive to play game $G_1$ or $G_2$ (as
appropriate) with probability 1. After playing context games $G_1$ or $G_2$ players survive to the next
round with probability $\beta$ ($0 < \beta < 1$). In principle, these survival probabilities could be different
but, for simplicity, we will assume they are equal.
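
The transition and survival rules described above can be sketched as two small functions. This is only an illustration: the integer state labels (0, 1, 2 for the three context games), the function names and the single-letter action strings are assumptions introduced here, not notation from the model.

```python
# States of the stochastic game (labels are an assumption for this sketch):
# 0 = G0 (associate or break up), 1 = G1 (prisoner's dilemma), 2 = G2 (alone).

def next_state(state, action_1, action_2):
    """Return the context game played at the next round.

    All transitions in this model happen with probability 1, so the
    result is a single state rather than a probability vector.
    """
    if state == 0:
        # Both players must choose "A" (associate) to reach G1;
        # a single "B" (break up) is enough to send both to G2.
        return 1 if (action_1 == "A" and action_2 == "A") else 2
    if state == 1:
        # Whatever is played in G1, the players return to G0.
        return 0
    # G2 is absorbing: once alone, always alone.
    return 2


def survival_probability(state, beta):
    """Probability of surviving to the next round after playing `state`."""
    # Survival is certain after G0 and equals beta after G1 or G2.
    return 1.0 if state == 0 else beta
```

Because a round of $G_0$ carries no payoff and no mortality risk, one decision round followed by one activity round behaves like a single period of a repeated game with continuation probability $\beta$.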

As with the iterated prisoner's dilemma there is an infinite number of pure strategies that could 
be considered. We will initially restrict our attention to the following three strategies. 

• Conditional cooperation (which we denote $\sigma_C$). A player following this strategy will initially
"Associate" in $G_0$ then "Cooperate" in $G_1$; if this behaviour is reciprocated then the player
will continue to associate and cooperate; otherwise it will choose "Break up" in $G_0$.

• Defection (which we denote $\sigma_D$). A player following this rather pathological strategy will
"Associate" in $G_0$ and then "Defect" in $G_1$.

• An unsociable strategy (which we denote $\sigma_B$). A player following this strategy will "Break
up" in $G_0$. Strictly speaking this is a set of strategies since any behaviour is allowed in $G_1$.
However, since we do not consider the possibility that players make errors, the behaviour
in $G_1$ does not affect payoffs. Consequently we ignore this technicality.

The consequences of introducing a fourth strategy of unconditional cooperation will be considered 
later. 
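
One way to obtain repeated-game payoffs for these three strategies is to play them against each other under the rules described above and add up the per-period payoffs, weighting each round by the probability of surviving to it. The sketch below does this with a truncated sum. The function name, the single-letter strategy labels ("C", "D", "B") and the `horizon` cut-off are assumptions made for illustration, and the symbol `z` is used for the per-period payoff of living alone.

```python
def repeated_payoff(own, other, t, r, p, s, z, beta, horizon=2000):
    """Expected total payoff to strategy `own` against strategy `other`.

    `own` and `other` are "C" (conditional cooperation), "D" (defection)
    or "B" (unsociable). The infinite sum is truncated after `horizon`
    payoff rounds (an assumption; the neglected tail is of order
    beta**horizon).
    """
    # Per-period payoffs in the prisoner's dilemma, indexed by (own, other).
    pd = {("C", "C"): r, ("C", "D"): s, ("D", "C"): t, ("D", "D"): p}
    total = 0.0
    alive = 1.0  # probability of having survived to the current round
    # A partnership forms only if neither player is unsociable.
    together = own != "B" and other != "B"
    for _ in range(horizon):
        if together:
            total += alive * pd[(own, other)]  # a round of G1
            # A conditional cooperator breaks up unless cooperation was mutual.
            if "C" in (own, other) and (own, other) != ("C", "C"):
                together = False
        else:
            total += alive * z  # a round of G2 (living alone)
        alive *= beta  # survive to the next payoff round
    return total
```

For example, a conditional cooperator who meets a defector collects `s` once and then `z` in every later round, so the truncated sum approaches `s + beta * z / (1 - beta)`.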

Evolutionary Dynamics 

We set up the evolutionary dynamics by considering an infinitely large population of individuals
who adopt one of the three pure strategies. The payoffs in the repeated game, $\pi(\sigma, \sigma')$, for adopting
strategy $\sigma$ against an opponent who adopts strategy $\sigma'$ are

Let $x_1$ and $x_2$ be the proportions of individuals who adopt $\sigma_C$ and $\sigma_D$ respectively. The
proportion of individuals using $\sigma_B$ is then $1 - x_1 - x_2$. The standard replicator dynamics [4] is then two
equations describing the evolution of a point $x = (x_1, x_2)$ in the domain

$$\Delta = \{(x_1, x_2) : (x_1 \geq 0) \cap (x_2 \geq 0) \cap (x_1 + x_2 \leq 1)\}. \qquad (2)$$

Denote

$$a = (z - r) - \gamma (z - t); \quad b = \gamma (z - s) - (z - p); \quad c = (z - r); \quad f = \gamma (z - s), \quad \text{where } \gamma = 1 - \beta.$$
Then the Replicator Dynamics can be written as the following system of equations.

$$\dot{x}_1 = x_1 \left( c x_1^2 + (f + c - a) x_1 x_2 + (f - b) x_2^2 - c x_1 - f x_2 \right)$$

$$\dot{x}_2 = x_2 \left( c x_1^2 + (f + c - a) x_1 x_2 + (f - b) x_2^2 + (a - c) x_1 + (b - f) x_2 \right)$$
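
The two equations above translate directly into code. The following is a minimal sketch, assuming $x_1$ and $x_2$ are the proportions of conditional cooperators and defectors; the function name and argument order are choices made here for illustration.

```python
def replicator_rhs(x1, x2, t, r, p, s, z, beta):
    """Right-hand side (dx1/dt, dx2/dt) of the two-equation system.

    x1, x2 are the proportions of conditional cooperators and defectors;
    the unsociable proportion is 1 - x1 - x2.
    """
    gamma = 1.0 - beta
    a = (z - r) - gamma * (z - t)
    b = gamma * (z - s) - (z - p)
    c = z - r
    f = gamma * (z - s)
    # Terms shared by both equations.
    common = c * x1**2 + (f + c - a) * x1 * x2 + (f - b) * x2**2
    dx1 = x1 * (common - c * x1 - f * x2)
    dx2 = x2 * (common + (a - c) * x1 + (b - f) * x2)
    return dx1, dx2
```

Two quick sanity checks follow from the form of the equations: every vertex of the domain is a fixed point, and on the edge $x_1 + x_2 = 1$ the two derivatives sum to zero, so a population with no unsociable players stays on that edge.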

Although this system is integrable for arbitrary choices of parameter values [?], the general solution
given in appendix A is not easy to work with. We will now introduce a commonly used set of values
for the prisoner's dilemma context game $G_1$ to reduce the number of parameters, and we will use
the standard linearization approach to study how the solution depends on the value of the outside
option, $z$, and the survival probability, $\beta$. Accordingly we set
The payoff matrix then becomes
and the Replicator Dynamics is as follows.

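$$\dot{x}_1 = \frac{x_1}{1-\beta} \left( (z - 3) x_1^2 + (1 - \beta)(2z - 5) x_1 x_2 + (z - 1) x_2^2 + (3 - z) x_1 + (\beta z - z) x_2 \right)$$

$$\dot{x}_2 = \frac{x_2}{1-\beta} \left( (z - 3) x_1^2 + (1 - \beta)(2z - 5) x_1 x_2 + (z - 1) x_2^2 + (1 - \beta)(5 - z) x_1 + (1 - z) x_2 \right)$$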

There are four fixed points for this dynamics and a standard linearization analysis produces the
results shown in table 2.

Table 2. Eigenvalues and eigenvectors for the fixed points
in the replicator dynamics system given by equations (3).
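
A linearization of this kind can be cross-checked numerically. The sketch below is not the analysis behind table 2; it approximates the Jacobian by central differences and finds the eigenvalues of the resulting 2×2 matrix from its trace and determinant. The function `rhs` transcribes the system with the numerical coefficients given above, assuming a constant prefactor $1/(1-\beta)$ (which only rescales the eigenvalues); the function names and the step size `h` are assumptions.

```python
import cmath


def rhs(x1, x2, z, beta):
    """Right-hand side of the system with the numerical payoff values."""
    common = ((z - 3) * x1**2
              + (1 - beta) * (2 * z - 5) * x1 * x2
              + (z - 1) * x2**2)
    dx1 = x1 / (1 - beta) * (common + (3 - z) * x1 + (beta * z - z) * x2)
    dx2 = x2 / (1 - beta) * (common + (1 - beta) * (5 - z) * x1 + (1 - z) * x2)
    return dx1, dx2


def eigenvalues(x1, x2, z, beta, h=1e-6):
    """Eigenvalues of a central-difference Jacobian at the point (x1, x2)."""
    fp, fm = rhs(x1 + h, x2, z, beta), rhs(x1 - h, x2, z, beta)
    gp, gm = rhs(x1, x2 + h, z, beta), rhs(x1, x2 - h, z, beta)
    j11, j21 = (fp[0] - fm[0]) / (2 * h), (fp[1] - fm[1]) / (2 * h)
    j12, j22 = (gp[0] - gm[0]) / (2 * h), (gp[1] - gm[1]) / (2 * h)
    trace = j11 + j22
    det = j11 * j22 - j12 * j21
    # Roots of lambda^2 - trace * lambda + det = 0 (may be complex).
    disc = cmath.sqrt(trace * trace - 4 * det)
    return (trace + disc) / 2, (trace - disc) / 2
```

As an illustration, `eigenvalues(1.0, 0.0, 2.0, 0.9)` evaluates the vertex at which every player is a conditional cooperator for $z = 2$ and $\beta = 0.9$; both eigenvalues there have negative real part, so that vertex is asymptotically stable for these values. At the vertex where everyone lives alone, `eigenvalues(0.0, 0.0, z, beta)` returns values that are numerically zero, because the right-hand side has no linear terms there.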



Depending on the values of the parameters $z$ and $\beta$ we obtain different solutions for the
dynamics (3). The $\beta$-$z$ parameter space can be divided into 10 regions (see figure 1) which have
qualitatively different pictures of the dynamics (see figure 2). The main features of this overall
picture can be summarized as follows. If $z > 3$ then the population evolves towards a monomorphic
state in which every player uses the unsociable strategy $\sigma_B$. If $z < 1$ then the picture resembles the
iterated prisoner's dilemma: if $\beta$ is small then defection is stable, but if $\beta$ is large populations
using either defection or conditional cooperation are asymptotically stable and the population which
arises depends on the initial conditions. The most interesting dynamics occur when $\beta$ is large and
$1 < z < 3$ (labelled as regions VI to IX in figure 1). If $\beta$ is large and z < 7 then conditional
cooperation is the only asymptotically stable behaviour, and in region VII this is the endpoint of
all trajectories which start in the interior of the simplex. In region VI a polymorphic population is
stable: a proportion of players, $x$, use the conditional cooperative strategy and a proportion, $1 - x$,
defect, where

$$x = \frac{\beta z - 1}{1 + 2\beta z - 5\beta}.$$

In this population the proportion of individuals that would be observed
in cooperative partnerships is $x^2$; the proportion of individuals involved in partnerships for which
mutual defection was the norm would be $(1 - x)^2$; and a proportion $2x(1 - x)$ of individuals would
be living alone.
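
The composition of such a polymorphic population can be worked through numerically. The sketch below assumes the fixed point $x = (\beta z - 1)/(1 + 2\beta z - 5\beta)$, which is what equating the growth rates of conditional cooperators and defectors gives on the edge with no unsociable players, using the numerical coefficients of the replicator system above. The function names and the dictionary keys are assumptions, and the values $z = 2.5$, $\beta = 0.6$ are chosen only for illustration.

```python
def polymorphic_state(z, beta):
    """Proportion x of conditional cooperators and the resulting pairings."""
    x = (beta * z - 1) / (1 + 2 * beta * z - 5 * beta)
    return {
        "x": x,
        "cooperative_pairs": x**2,         # both partners cooperate
        "defecting_pairs": (1 - x)**2,     # both partners defect
        "living_alone": 2 * x * (1 - x),   # mixed pairs break up
    }


def growth_rate_gap(x, z, beta):
    """Difference between the per-capita growth terms of the two strategies
    on the edge x1 = x, x2 = 1 - x (zero at the polymorphic fixed point)."""
    x1, x2 = x, 1 - x
    cooperator = (3 - z) * x1 + (beta * z - z) * x2
    defector = (1 - beta) * (5 - z) * x1 + (1 - z) * x2
    return cooperator - defector


state = polymorphic_state(2.5, 0.6)
gap = growth_rate_gap(state["x"], 2.5, 0.6)
```

With these illustrative values half of the population cooperates conditionally, and the three pairing proportions sum to one, as they must for randomly matched pairs.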



Introducing unconditional cooperators 

In region VII of the $\beta$-$z$ parameter space we have found that conditional cooperation is
asymptotically stable. It is pertinent to ask whether this property would be destroyed if we allowed
individuals to use the "sucker" strategy of unconditional cooperation (which we denote $\sigma_S$). Recall
that in the iterated prisoner's dilemma, tit-for-tat is not asymptotically stable due to the
presence of unconditional cooperators. Similarly, it is conceivable that the polymorphic population
which is stable in region VI could be destabilized by the introduction of a strategy of unconditional
cooperation.

We introduce a proportion $x_3$ of players who use the strategy of unconditional cooperation, $\sigma_S$.
These players associate in $G_0$ and cooperate in $G_1$ whatever their opponent does. (The proportion
of individuals using $\sigma_B$ is then $1 - x_1 - x_2 - x_3$.) The new payoff matrix is given by

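$$\begin{pmatrix}
\pi(\sigma_C,\sigma_C) & \pi(\sigma_C,\sigma_D) & \pi(\sigma_C,\sigma_S) & \pi(\sigma_C,\sigma_B) \\
\pi(\sigma_D,\sigma_C) & \pi(\sigma_D,\sigma_D) & \pi(\sigma_D,\sigma_S) & \pi(\sigma_D,\sigma_B) \\
\pi(\sigma_S,\sigma_C) & \pi(\sigma_S,\sigma_D) & \pi(\sigma_S,\sigma_S) & \pi(\sigma_S,\sigma_B) \\
\pi(\sigma_B,\sigma_C) & \pi(\sigma_B,\sigma_D) & \pi(\sigma_B,\sigma_S) & \pi(\sigma_B,\sigma_B)
\end{pmatrix} \qquad (4)$$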
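
For readers who want to experiment with the four-strategy system, the sketch below builds a payoff matrix and the corresponding replicator right-hand side. The entries are an assumption: they are derived here from the verbal descriptions of the strategies (a partnership with a conditional cooperator ends after one round unless cooperation is mutual, an unconditional cooperator never leaves, and an unsociable player always lives alone), not copied from matrix (4). The function names and the ordering C, D, S, B are likewise choices made for illustration.

```python
def payoff_matrix(t, r, p, s, z, beta):
    """Repeated-game payoffs, rows and columns ordered C, D, S, B.

    Assumed entries: derived from the strategy descriptions, with z the
    per-period payoff for living alone and beta the survival probability.
    """
    g = 1.0 - beta
    alone = z / g                 # alone from the first round onwards
    after_breakup = beta * z / g  # alone from the second round onwards
    return [
        [r / g, s + after_breakup, r / g, alone],  # conditional cooperator
        [t + after_breakup, p / g, t / g, alone],  # defector
        [r / g, s / g, r / g, alone],              # unconditional cooperator
        [alone, alone, alone, alone],              # unsociable
    ]


def replicator_rhs(x, A):
    """Standard replicator dynamics dx_i/dt = x_i * (fitness_i - mean)."""
    n = len(x)
    fitness = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
    mean = sum(x[i] * fitness[i] for i in range(n))
    return [x[i] * (fitness[i] - mean) for i in range(n)]
```

In this sketch conditional and unconditional cooperators earn the same payoff against each other and against themselves; they differ only in how they fare against defectors, which is why the presence of defectors matters for which mixtures of the two cooperative strategies can persist.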
An analysis of the corresponding Replicator dynamics leads to the dynamics shown in figures
3 and 4 for regions VI and VII respectively (see appendix B for details). These figures show the
dynamics for particular values of $z$ and $\beta$ but the pictures are qualitatively similar for any values
of these parameters in the appropriate range. From these figures we can see that the polymorphic
population remains asymptotically stable in region VI. In region VII, the population of conditional
cooperators is no longer asymptotically stable. However, populations which consist of mixtures of
conditional and unconditional cooperators are the only end points of all solution trajectories which
start in the interior of the simplex.

Figure 1. Regions for the parameters $z$ and $\beta$ which lead to qualitatively different pictures
of the evolutionary dynamics.



Discussion 

A minimal version of the iterated prisoner's dilemma deals with a population consisting of
unconditional cooperators, unconditional defectors and conditional cooperators (such as tit-for-tat).
In that model there is a threshold problem: cooperative behaviour only evolves if the initial
proportion of conditional cooperators exceeds some value [5][2]. Although it is sometimes suggested
that the always defect strategy is an ESS or that the corresponding population is asymptotically
stable, this is not the case. If sufficiently many varied strategies are introduced then the barrier
can be removed [6].

Figure 2. Qualitative pictures of the dynamics in the different regions of the $\beta$-$z$
parameter space. The regions are labelled according to figure 1. Point a corresponds to a
population which consists of 100% of players using strategy $\sigma_B$. Point b corresponds to a
population which consists of 100% of players using strategy $\sigma_C$. Point c corresponds to a
population which consists of 100% of players using strategy $\sigma_D$. Point d corresponds to a
polymorphic population with players using either $\sigma_C$ or $\sigma_D$. Asymptotically stable points
are shown as solid circles, the other fixed points are shown as open circles.
Figure 3. Qualitative picture of the dynamics for the replicator system with payoff matrix
given by equation (4). Vertex S corresponds to a population which consists of 100% of
players using strategy $\sigma_S$. Vertex C corresponds to a population which consists of 100% of
players using strategy $\sigma_C$. Vertex D corresponds to a population which consists of 100% of
players using strategy $\sigma_D$. The unlabelled vertex corresponds to a population in which all
players live alone.



We have introduced an outside option into the iterated prisoner's dilemma, which allows indi- 
viduals to avoid being condemned to maintain an unprofitable interaction of permanent mutual 
defection. This provides another way of removing the barrier to the evolution of cooperative be- 
haviour. The requirement is that the payoff from the outside option should be neither so poor that 
it is irrelevant nor so high that everyone opts for a solitary existence. The existence of the outside 
option also admits a range of parameter values for which a polymorphic population involving de- 
fectors and conditional cooperators is asymptotically stable, even in the presence of unconditional 
cooperators. 

Some of the results we have obtained are similar to those obtained for optional public good 
games, which are multi-player generalizations of the prisoner's dilemma [7]. In these games, as
in ours, making participation voluntary enhances the possibilities for cooperation. One difference 
between the two models is that in the optional public good game rock-scissors-paper style cycles 
may occur. In our model, such cyclic behaviour does not arise. However, in both models the fixed 
point representing non-participatory behaviour may be non-hyperbolic. This leads to periods of 
cooperative behaviour, but eventually the population returns to a state in which everyone lives 
alone. 



The iterated prisoner's dilemma is an unrealistic model of social interactions because it treats 
one type of interaction between individuals in isolation from all others. We have shown, by means 
of a relatively simple example, that the methods of stochastic game theory can be employed to 
overcome this restriction. The prisoner's dilemma has also been criticized as being an unrealistic
model of social interactions on other grounds [8]. Our approach is not specific to the prisoner's
dilemma. That context game may be replaced by any other game or, indeed, a game which is
randomly selected with a known probability from a set of games [9]. This allows quite complex
social behaviour to be analyzed. 



Appendix A 

To integrate the Replicator Dynamics system we make the following coordinate substitutions.
Solution trajectories can be found by integrating
where $C$ is a constant that depends on the initial conditions. Finally, substituting the expressions
for $k$ and $l$ into the above formula, we find that the solution trajectories are described by the
expression.

Appendix B

The Replicator Dynamics with payoff matrix (4) is given by the following system of equations.
The fixed points together with their associated eigenvectors and eigenvalues are given in table 3.

Table 3. Eigenvalues and eigenvectors for the fixed points
of the replicator system with payoff matrix given by equation (4).